[TS] CH07. ARIMA 실습

Time Series
Author

김보람

Published

November 23, 2023

분산이 일정하지 않은 경우

z <- AirPassengers
z
A Time Series: 12 × 12
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432
class(z)
'ts'
plot.ts(z, main = "Time series plot for Airpassengers data")

  • 추세, 이분산, 계절성분이 있다.

- Box-Cox transformation

\[f_\lambda(Z_t) = \begin{cases} \dfrac{Z_t^\lambda-1}{\lambda}, & Z_t \geq 0, \lambda>0 \\ log(Z_t), & \lambda=0 \end{cases}\]

forecast::BoxCox.lambda(z, method='loglik')
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
0.2
  • 최적의 람다를 찾아줌
forecast::BoxCox.lambda(z, method = "guerrero")
-0.294715585559316

- ts객체가 들어가야함

forecast::BoxCox(z,lambda= forecast::BoxCox.lambda(z, method='loglik'))
A Time Series: 12 × 12
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 7.847352 7.982144 8.276536 8.215632 8.047493 8.336343 8.583833 8.583833 8.356042 8.004073 7.658338 7.982144
1950 7.915451 8.153584 8.452835 8.336343 8.132639 8.602141 8.965606 8.965606 8.762630 8.296592 7.892911 8.433699
1951 8.528312 8.620350 9.094641 8.848653 8.998313 9.094641 9.412543 9.412543 9.188405 8.831619 8.546920 8.899258
1952 8.981998 9.126173 9.324566 9.141833 9.172949 9.677811 9.835957 9.987634 9.554566 9.294754 8.998313 9.339379
1953 9.368824 9.368824 9.912567 9.899908 9.823034 10.000000 10.250736 10.342064 9.925183 9.582316 9.126173 9.441397
1954 9.484251 9.249564 9.899908 9.797051 9.887205 10.250736 10.666479 10.571969 10.192525 9.823034 9.470023 9.823034
1955 9.987634 9.874459 10.285240 10.308071 10.319435 10.799092 11.262611 11.107788 10.768883 10.364560 9.925183 10.409160
1956 10.475107 10.398058 10.819103 10.778978 10.829071 11.351000 11.678616 11.613495 11.181384 10.707761 10.330766 10.707761
1957 10.799092 10.656090 11.190490 11.117061 11.181384 11.750682 12.078927 12.093594 11.605283 11.107788 10.697481 11.004343
1958 11.042269 10.829071 11.244701 11.117061 11.253666 11.852637 12.265784 12.363141 11.605283 11.217686 10.748614 11.013858
1959 11.226711 11.061098 11.621691 11.538992 11.734774 12.130041 12.649244 12.719537 12.064211 11.629871 11.244701 11.613495
1960 11.710799 11.497014 11.726798 12.049443 12.130041 12.564701 13.102063 13.007960 12.383721 12.049443 11.488567 11.829327
forecast::BoxCox(z,lambda= 'auto')
A Time Series: 12 × 12
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 2.548535 2.561426 2.588461 2.582990 2.567558 2.593773 2.615143 2.615143 2.595510 2.563492 2.529884 2.561426
1950 2.555089 2.577352 2.603953 2.593773 2.575434 2.616686 2.646282 2.646282 2.629992 2.590249 2.552929 2.602296
1951 2.610433 2.618215 2.656336 2.636968 2.648852 2.656336 2.680161 2.680161 2.663501 2.635595 2.612017 2.641022
1952 2.647572 2.658759 2.673698 2.659957 2.662328 2.699069 2.709945 2.720109 2.690390 2.671486 2.648852 2.674793
1953 2.676962 2.676962 2.715111 2.714262 2.709067 2.720927 2.737151 2.742897 2.715955 2.692360 2.658759 2.682259
1954 2.685357 2.668111 2.714262 2.707296 2.713408 2.737151 2.762643 2.756996 2.733443 2.709067 2.684331 2.709067
1955 2.720109 2.712549 2.739332 2.740768 2.741481 2.770427 2.796406 2.787934 2.768668 2.744300 2.715955 2.747066
1956 2.751119 2.746379 2.771588 2.769257 2.772164 2.801154 2.818211 2.814887 2.791987 2.765084 2.742191 2.765084
1957 2.770427 2.762027 2.792485 2.788447 2.791987 2.821853 2.837961 2.838662 2.814466 2.787934 2.764478 2.782161
1958 2.784288 2.772164 2.795437 2.788447 2.795922 2.826939 2.846793 2.851301 2.814466 2.793969 2.767483 2.782696
1959 2.794460 2.785340 2.815307 2.811044 2.821052 2.840400 2.864196 2.867286 2.837255 2.815726 2.795437 2.814887
1960 2.819842 2.808860 2.820650 2.836545 2.840400 2.860440 2.883580 2.879651 2.852247 2.836545 2.808419 2.825783
t <- 1:length(z)

- linear model이 들어가야함

MASS::boxcox(z~t)

  • 가능도함수를 가장 크게 해주는 람다를 찾아줌
bc <- MASS::boxcox(z~t)

bc
$x
  1. -2
  2. -1.95959595959596
  3. -1.91919191919192
  4. -1.87878787878788
  5. -1.83838383838384
  6. -1.7979797979798
  7. -1.75757575757576
  8. -1.71717171717172
  9. -1.67676767676768
  10. -1.63636363636364
  11. -1.5959595959596
  12. -1.55555555555556
  13. -1.51515151515152
  14. -1.47474747474747
  15. -1.43434343434343
  16. -1.39393939393939
  17. -1.35353535353535
  18. -1.31313131313131
  19. -1.27272727272727
  20. -1.23232323232323
  21. -1.19191919191919
  22. -1.15151515151515
  23. -1.11111111111111
  24. -1.07070707070707
  25. -1.03030303030303
  26. -0.98989898989899
  27. -0.949494949494949
  28. -0.909090909090909
  29. -0.868686868686869
  30. -0.828282828282828
  31. -0.787878787878788
  32. -0.747474747474747
  33. -0.707070707070707
  34. -0.666666666666667
  35. -0.626262626262626
  36. -0.585858585858586
  37. -0.545454545454545
  38. -0.505050505050505
  39. -0.464646464646465
  40. -0.424242424242424
  41. -0.383838383838384
  42. -0.343434343434343
  43. -0.303030303030303
  44. -0.262626262626263
  45. -0.222222222222222
  46. -0.181818181818182
  47. -0.141414141414141
  48. -0.101010101010101
  49. -0.0606060606060606
  50. -0.0202020202020201
  51. 0.0202020202020203
  52. 0.060606060606061
  53. 0.101010101010101
  54. 0.141414141414141
  55. 0.181818181818182
  56. 0.222222222222222
  57. 0.262626262626263
  58. 0.303030303030303
  59. 0.343434343434343
  60. 0.383838383838384
  61. 0.424242424242424
  62. 0.464646464646465
  63. 0.505050505050505
  64. 0.545454545454546
  65. 0.585858585858586
  66. 0.626262626262626
  67. 0.666666666666667
  68. 0.707070707070707
  69. 0.747474747474748
  70. 0.787878787878788
  71. 0.828282828282829
  72. 0.868686868686869
  73. 0.909090909090909
  74. 0.94949494949495
  75. 0.98989898989899
  76. 1.03030303030303
  77. 1.07070707070707
  78. 1.11111111111111
  79. 1.15151515151515
  80. 1.19191919191919
  81. 1.23232323232323
  82. 1.27272727272727
  83. 1.31313131313131
  84. 1.35353535353535
  85. 1.39393939393939
  86. 1.43434343434343
  87. 1.47474747474747
  88. 1.51515151515152
  89. 1.55555555555556
  90. 1.5959595959596
  91. 1.63636363636364
  92. 1.67676767676768
  93. 1.71717171717172
  94. 1.75757575757576
  95. 1.7979797979798
  96. 1.83838383838384
  97. 1.87878787878788
  98. 1.91919191919192
  99. 1.95959595959596
  100. 2
$y
  1. -198.799073449348
  2. -195.37039950111
  3. -191.94968992843
  4. -188.537493633138
  5. -185.134338095595
  6. -181.740730731914
  7. -178.357219945539
  8. -174.984497973015
  9. -171.623294248388
  10. -168.274441727843
  11. -164.938858510347
  12. -161.617523276341
  13. -158.311587211365
  14. -155.022246206941
  15. -151.750870096316
  16. -148.498954283398
  17. -145.268092393427
  18. -142.06012344218
  19. -138.876948334787
  20. -135.720727426085
  21. -132.593785460871
  22. -129.498596427508
  23. -126.437961568566
  24. -123.414763422835
  25. -120.432235059637
  26. -117.493804438979
  27. -114.603099888283
  28. -111.764134954901
  29. -108.981017342337
  30. -106.258254630603
  31. -103.600549284161
  32. -101.012815323948
  33. -98.5003233041335
  34. -96.0684242249103
  35. -93.722782423167
  36. -91.4691951449285
  37. -89.3135682143162
  38. -87.2619677902398
  39. -85.3204701215757
  40. -83.4951346102045
  41. -81.7920100099495
  42. -80.2169681784438
  43. -78.7756511898379
  44. -77.4735695142388
  45. -76.315650880116
  46. -75.3066206646679
  47. -74.4506111638586
  48. -73.7510789691647
  49. -73.2111844477151
  50. -72.8329692383655
  51. -72.6181345825874
  52. -72.5674858952699
  53. -72.6809327642697
  54. -72.9580133738323
  55. -73.3970321104162
  56. -73.9959619454446
  57. -74.7519438462799
  58. -75.6613884991687
  59. -76.7204216050323
  60. -77.9243367495376
  61. -79.2682324587806
  62. -80.7467723383356
  63. -82.3542850745746
  64. -84.0850062620454
  65. -85.9329380987263
  66. -87.892044167354
  67. -89.9563047616737
  68. -92.1197117342045
  69. -94.3763431162309
  70. -96.7204827466603
  71. -99.1464739467718
  72. -101.648962801429
  73. -104.222775173309
  74. -106.862922356751
  75. -109.564796705531
  76. -112.323883247451
  77. -115.136059766643
  78. -117.997408428711
  79. -120.904219103619
  80. -123.853155375878
  81. -126.840969011805
  82. -129.864771826518
  83. -132.921839900135
  84. -136.009634355363
  85. -139.125908387669
  86. -142.268485619794
  87. -145.435470690635
  88. -148.625080760812
  89. -151.835678635438
  90. -155.065828872171
  91. -158.3141482545
  92. -161.579455664176
  93. -164.860640976599
  94. -168.156694471724
  95. -171.466728508306
  96. -174.78989955717
  97. -178.125534851951
  98. -181.473010287635
  99. -184.83167793835
  100. -188.200864446038
which.max(bc$y)
52
bc$x[which.max(bc$y)]
0.060606060606061
z_boxcox <- forecast::BoxCox(z, lambda=bc$x[which.max(bc$y)])
par(mfrow=c(2,2))
plot.ts(z, main = "Original")
plot.ts(log(z), main = "log(z)")
plot.ts(sqrt(z),, main = "sqrt(z)")
plot.ts(z_boxcox, main = "Boxcox_lambda=0.06")
graphics.off()

  • 로그변환을 수행한다.

확률적 추세 제거

- 확률보행과정의 정상화

\(Z_t = Z_{t−1} + ϵ_t, ϵ_t ∼ WN(0, σ^2)\) - AR(1) with \(ϕ = 1\)

e <- round(rnorm(10),2)
z <- cumsum(e)
plot.ts(z)

e <- round(rnorm(1000),2)
z <- cumsum(e)
forecast::tsdisplay(z, lag.max=24)

  • 추세라기 보단 분산이 커짐.. —> 확률적 추세

  • ACF가 아주 천천히 감소하므로 확률적 추세가 있다고 할 수 있다

  • 1차 차분을 수행한다

\(ΔZ_t = Z_t − Z_{t−1} = ϵ_t\)

  • 차분한 확률보행과정의 시도표 및 ACF/PACF 그림

- diff:차분함수

예시

a <- c(3,7,2,9)
diff(a)
  1. 4
  2. -5
  3. 7

4: 7-3 : a[2]-a[1]

-5: 2-7 : a[3] - a[2]

7: 9-2 : a[4] - a[3]

diff(a,2)
  1. -1
  2. 2

-1: 2-3 : a[3] - a[1]

2: 9-7: a[4] - a[2]

options(repr.plot.width = 12, repr.plot.height = 10)
forecast::tsdisplay(diff(z), lag.max=24)

  • 차분을 했더니 정상 시계열이 됨.

\(ΔZ_t ∼ WN = ARMA(0, 0) ⇒ Z_t ∼ ARIMA(0, 1, 0)\)

  • 로그변환한 Airpassengers data 에 대한 확률적 추세 확인
logz <- log(AirPassengers)
options(repr.plot.width = 12, repr.plot.height = 6)
plot.ts(logz)

  • 추세가 있다. 결정적 추세로 해도 될거 같기도 하고…
par(mfrow=c(1,2))
acf(logz)
pacf(logz)

  • 계절성분 떄문에 acf그래프가 중간에 내려오다가 볼록 튀어 나온 것
acf(logz, lag.max=72)

차분을 진행한다.

\(ΔZ_t = (1 − B)Z_t = Z_t − Z_{t−1}\)

d_log_z = diff(logz)
plot.ts(d_log_z)

forecast::tsdisplay(d_log_z, lag.max=36)

여전히 계절성분이 남아있으므로, 계절차분 진행

\(Δ_{12}ΔZ_t = (1 − B^{12})(1 − B)Z_t = (1 − B^{12})(Z_t − Z_{t−1}) = Z_t − Z_{t−1} − Z_{t−12} − Z_{t−1}\)

ds_d_log_z <- diff(d_log_z, 12)
plot.ts(ds_d_log_z)

forecast::tsdisplay(ds_d_log_z, lag.max=36)

  • 하지만, 일반적으로 계절성분과 추세가 동시에 있는 경우 계절차분을 먼저 진행한다
ds_log_z <- diff(logz, 12)
ts.plot(ds_log_z)

forecast::tsdisplay(ds_log_z, lag.max=36)

확률적추세가 있어 보이지만, AR 모형에서 ACF가 감소하는 형태라고 할 수도 있다. 이 경우에는 추세를 제거하기 위한 차분을 더 진행하는 것을 결정하는 것이 어렵다.

depart data

z <-scan("depart.txt")
plot.ts(z)

log_z <- log(z)
forecast::tsdisplay(log_z, lag.max=36)

계절차분을 먼저 진행

ds_log_z <- diff(log_z, 12)
forecast::tsdisplay(ds_log_z, lag.max=36)

  • 이 데이터 시도표 상으로는 확률적인 추세가 있어보이지만, 역시 ACF만 보고 차분이 필 요한 가에 대한 결정을 하는 것이 쉽지 않다.

여러가지 Simulation

n <- 200
x = arima.sim(n=n, list(order=c(1,0,0), ar=0.5)) #AR(1)
z = arima.sim(n=n, list(order=c(1,1,0), ar=0.5)) #ARIMA(1,1,0)
forecast::tsdisplay(x, lag.max=36, main = "AR(1)")

forecast::tsdisplay(z, lag.max=36, main = "ARIMA(1,1,0)")

  • acf가 지수적으로 감소하고 있다. 차분이 필요하다.

  • ARIMA(1,1,0)모형은 차분하면 AR(1)모형이 된다.

forecast::tsdisplay(diff(z), lag.max=36, main = "(1-B)z")

n <- 200
x = arima.sim(n=n, list(order=c(0,0,1), ma=-0.8)) #MA(1)
z = arima.sim(n=n, list(order=c(0,1,1), ma=-0.8)) #ARIMA(0,1,1)
forecast::tsdisplay(x, lag.max=36, main = "MA(1)")

forecast::tsdisplay(z, lag.max=36, main = "ARIMA(0,1,1)")

forecast::tsdisplay(diff(z), lag.max=36, main = "(1-B)z")

n <- 200
x = arima.sim(n=n, list(order=c(1,0,1), ar=0.5, ma=-0.8)) #ARMA(1,1)
z = arima.sim(n=n, list(order=c(1,1,1), ar=0.5, ma=-0.8)) #ARIMA(1,1,1)
forecast::tsdisplay(x, lag.max=36, main = "ARMA(1,1)")

forecast::tsdisplay(z, lag.max=36, main = "ARIMA(1,1,1)")

forecast::tsdisplay(diff(z), lag.max=36, main = "(1-B)z")

과대차분

## ARIMA(0,1,1)
z = arima.sim(n=1000, list(order=c(0,1,1), ma=0.5))
forecast::tsdisplay(z, lag.max=36, main = "ARIMA(0,1,1)")

#한 번 차분
d_z <- diff(z)
forecast::tsdisplay(d_z, lag.max=36, main = "ARIMA(0,1,1)")

# 한 번 더 차분
d2_z <- diff(d_z)
forecast::tsdisplay(d2_z, lag.max=36, main = "ARIMA(0,1,1)")

  • 한 번 차분을 하면 정상시계열이 된다. 그리고 한 번 더 차분해도 마찬가지로 정상시계 열이다
sd(z)
sd(diff(z))
sd(diff(diff(z)))
8.72895467185367
1.18251197758341
1.2484612507557
  • 차분을 두번 하고 나면 분산이 커진다. 이를 과대차분이라고 한다